Abstract

This paper revisits the classical inference results for profile quasi maximum
likelihood estimators (profile MLE) in the semiparametric estimation problem.
We focus on two prominent results, the Wilks phenomenon and the Fisher expansion
for the profile MLE, and state them in a new fashion that allows for finite
samples and model misspecification. The method of study also differs essentially
from the usual analysis of the semiparametric problem based on the notion of the
hardest parametric submodel. Instead we apply the local bracketing and the upper
function devices from Spokoiny (2012). This approach makes it possible to address
the important issue of the effective target and nuisance dimension, and it does
not involve any pilot estimator of the target parameter. The obtained
nonasymptotic results are sharp and yield the classical asymptotic statements,
including the asymptotic normality and efficiency of the profile MLE. The general
results are specialized to the important case of an i.i.d. sample.
1 Introduction

Many statistical tasks can be viewed as problems of semiparametric estimation when the 
unknown data distribution is described by a high or infinite dimensional parameter while 
the target is of low dimension. Typical examples are provided by functional estimation, 
estimation of a function at a point, or simply by estimating a given subvector of the 
parameter vector. The classical statistical theory provides a general solution to this 
problem: estimate the full parameter vector by the maximum likelihood method and 
project the obtained estimate onto the target subspace. This approach is known as 
profile maximum likelihood and it appears to be semiparametrically efficient under some 
mild regularity conditions. We refer to the papers Murphy and Van der Vaart (2000, 
1999) and the book Kosorok (2005) for a detailed presentation of the modern state of the 
theory and further references. The famous Wilks result claims that the likelihood ratio 
test statistic in the semiparametric test problem is nearly chi-square with p degrees of 
freedom corresponding to the dimension of the target parameter. Various extensions of 
this result can be found e.g. in Fan et al. (2001); Fan and Huang (2005); Boucheron and 
Massart (2011); see also the references therein. 

This study revisits the problem of profile semiparametric estimation and addresses 
some new issues. The most important difference between our approach and the classical 
theory is a nonasymptotic character of our study. A finite sample analysis is
particularly challenging because most of the notions, methods and tools in the classical
theory are formulated in the asymptotic setup with growing sample size. Only a few
general finite sample results are available; see e.g. the recent paper Boucheron and Massart (2011).
The results of this paper explicitly describe all "small" terms in the expansion of the
log-likelihood. This helps to carefully treat the question of applicability of the approach 
in different situations. A particularly important question is about the critical dimension 
of the target p and the full parameter dimension p* for which the main results are still 
accurate. Another issue addressed in this paper is the model misspecification. In many 
practical problems, it is unrealistic to expect that the model assumptions are exactly 
fulfilled, even if some rich nonparametric models are used. This means that the true
data distribution $\mathbb{P}$ does not belong to the considered parametric family. Applicability
of the general semiparametric theory in such cases is questionable. An important feature 
of the new approach of Spokoiny (2012) is that it equally applies under a possible model 
misspecification. 

The mentioned issues, especially the nonasymptotic character of the study, require
an entirely different set of tools and methods of analysis. We apply the recent bracketing
approach of Spokoiny (2012) and demonstrate its power in the considered case of
semiparametric estimation. Let $Y$ denote the observed random data, and $\mathbb{P}$ denote the
data distribution. The parametric statistical model assumes that the unknown data
distribution $\mathbb{P}$ belongs to a given parametric family $(\mathbb{P}_{\upsilon})$:

$$ Y \sim \mathbb{P} = \mathbb{P}_{\upsilon^*} \in (\mathbb{P}_{\upsilon}, \ \upsilon \in \Upsilon), $$

where $\Upsilon$ is some high dimensional or even infinite dimensional parameter space. This
paper concentrates on a finite dimensional setting; an extension to a functional
space is feasible and will be considered elsewhere. The maximum likelihood approach in
parametric estimation suggests estimating the whole parameter vector $\upsilon$ by maximizing
the corresponding log-likelihood $L(\upsilon) = \log \frac{d\mathbb{P}_{\upsilon}}{d\mu}(Y)$ for some dominating measure $\mu$:

$$ \tilde{\upsilon} \stackrel{\mathrm{def}}{=} \operatorname*{argmax}_{\upsilon \in \Upsilon} L(\upsilon). $$

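Our study admits a model misspecification $\mathbb{P} \notin (\mathbb{P}_{\upsilon}, \ \upsilon \in \Upsilon)$. Equivalently, one can
say that $L(\upsilon)$ is the quasi log-likelihood function on $\Upsilon$. The "target" value $\upsilon^*$ of the
parameter $\upsilon$ can be defined by

$$ \upsilon^* \stackrel{\mathrm{def}}{=} \operatorname*{argmax}_{\upsilon \in \Upsilon} \mathbb{E} L(\upsilon). $$

Under model misspecification, $\upsilon^*$ defines the best parametric fit to $\mathbb{P}$ by the considered
family.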
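
As an illustration of the "best parametric fit", the following minimal sketch uses a toy setting that is not taken from the paper: the working model is exponential with unknown rate, while the data are assumed to follow a Gamma law with shape 2 and scale 1. The expected quasi log-likelihood per observation is then log(rate) - 2 * rate, so the target value is rate* = 1/2, and the quasi MLE 1/mean should be close to it. All function names are hypothetical.

```python
import math
import random

def expected_quasi_loglik(rate, mean_y):
    """Per-observation E L(rate) for the exponential working model
    log p(y; rate) = log(rate) - rate * y, under any law with E[Y] = mean_y."""
    return math.log(rate) - rate * mean_y

def quasi_mle(sample):
    """Maximizer of the exponential quasi log-likelihood: 1 / sample mean."""
    return len(sample) / sum(sample)

# Assumed true law: Gamma(shape=2, scale=1), so E[Y] = 2 and the model is misspecified.
mean_y = 2.0
grid = [0.01 * k for k in range(1, 301)]
rate_star = max(grid, key=lambda r: expected_quasi_loglik(r, mean_y))

rng = random.Random(1)
sample = [rng.gammavariate(2.0, 1.0) for _ in range(20000)]
rate_hat = quasi_mle(sample)
print(f"target rate*: {rate_star:.2f}, quasi MLE: {rate_hat:.3f}")
```

The grid search stands in for the argmax of the expected quasi log-likelihood; in this toy case the maximizer is also available in closed form.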

In the semiparametric framework, the target of analysis is only a low dimensional
component $\theta$ of the whole parameter $\upsilon$. This means that the target of estimation is

$$ \theta^* = \Pi_0 \upsilon^*, $$

for some mapping $\Pi_0 : \Upsilon \to \mathbb{R}^p$, where $p \in \mathbb{N}$ stands for the dimension of the target.

The profile maximum likelihood approach defines the estimator of $\theta^*$ by projecting
the obtained MLE $\tilde{\upsilon}$ on the target space:

$$ \tilde{\theta} = \Pi_0 \tilde{\upsilon}. $$

The Gauss-Markov Theorem claims the efficiency of such procedures for linear Gaussian
models and linear mapping $\Pi_0$, and the famous Fisher result extends it in the asymptotic
sense to the general situation under some regularity conditions. The Wilks phenomenon
describes the limiting distribution of the likelihood ratio test statistic $T$:

$$ T = \sup_{\upsilon \in \Upsilon} L(\upsilon) - \sup_{\upsilon \in \Upsilon,\ \Pi_0 \upsilon = \theta^*} L(\upsilon). \qquad (1.1) $$
It appears that the distribution of twice this test statistic is nearly chi-square $\chi^2_p$ as the
sample size grows, Wilks (1938). In particular, this limiting behavior does not depend on the
particular model structure and on the full dimension of the parameter $\upsilon$, only the
dimension of the target matters. The full parameter dimension can be even infinite
under some upper bounds on its total entropy.
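
The Wilks phenomenon can be checked numerically in a simple case. The sketch below is a toy example, not part of the paper: an i.i.d. Gaussian sample with the mean as the target (p = 1) and the variance as the nuisance parameter. It simulates twice the likelihood ratio statistic under the null and compares its mean and upper tail with those of the chi-square law with one degree of freedom. The sample size, the number of replications and the seed are arbitrary choices.

```python
import math
import random

def lr_statistic(sample, theta_star):
    """Twice the log-likelihood ratio 2T for H0: mean = theta_star in the
    Gaussian model with unknown variance (the nuisance parameter)."""
    n = len(sample)
    mean = sum(sample) / n
    # The maximized log-likelihood equals -(n/2) * (log(s2) + 1) + const,
    # where s2 is the variance estimate under the respective constraint.
    s2_full = sum((y - mean) ** 2 for y in sample) / n
    s2_null = sum((y - theta_star) ** 2 for y in sample) / n
    return n * math.log(s2_null / s2_full)

def simulate(n=200, reps=2000, seed=0):
    rng = random.Random(seed)
    stats = []
    for _ in range(reps):
        sample = [rng.gauss(1.0, 2.0) for _ in range(n)]
        stats.append(lr_statistic(sample, theta_star=1.0))
    return stats

stats = simulate()
mean_stat = sum(stats) / len(stats)
# 3.841 is the 0.95-quantile of the chi-square law with 1 degree of freedom.
exceed = sum(s > 3.841 for s in stats) / len(stats)
print(f"mean of 2T: {mean_stat:.3f} (chi-square mean is 1)")
print(f"P(2T > 3.841): {exceed:.3f} (nominal level 0.05)")
```

The limiting law depends only on the target dimension here, not on the presence of the nuisance variance.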

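Below we consider a slightly different presentation of this estimator based on the
partial optimization of the objective function $L(\upsilon)$ for a fixed $\theta$. Namely, define

$$ \breve{L}(\theta) \stackrel{\mathrm{def}}{=} \max_{\Pi_0 \upsilon = \theta} L(\upsilon). \qquad (1.2) $$

Then the profile MLE $\tilde{\theta}$ can be defined as the point of maximum of $\breve{L}(\theta)$:

$$ \tilde{\theta} = \operatorname*{argmax}_{\theta} \breve{L}(\theta) = \operatorname*{argmax}_{\theta} \max_{\Pi_0 \upsilon = \theta} L(\upsilon). $$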
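
The two-step definition can be sketched in code. The example below is a toy illustration under assumptions not taken from the paper: a Gaussian sample with target mean and nuisance variance, where the inner maximization over the nuisance is available in closed form. The outer maximization is done by a simple ternary search, which relies on the profile function being unimodal in this model. The result is compared with the target component of the full MLE (the sample mean).

```python
import math
import random

def profile_loglik(theta, sample):
    """Profile log-likelihood: max over eta of the Gaussian log-likelihood with
    mean theta and variance eta; the inner maximizer is eta = mean((y - theta)^2)."""
    n = len(sample)
    eta_hat = sum((y - theta) ** 2 for y in sample) / n
    return -0.5 * n * (math.log(2 * math.pi * eta_hat) + 1.0)

def argmax_unimodal(f, lo, hi, iters=200):
    """Ternary search for the maximizer of a unimodal function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return 0.5 * (lo + hi)

rng = random.Random(2)
sample = [rng.gauss(0.3, 1.5) for _ in range(500)]
theta_profile = argmax_unimodal(lambda t: profile_loglik(t, sample),
                                min(sample), max(sample))
theta_full_mle = sum(sample) / len(sample)  # theta-component of the full MLE
print(theta_profile, theta_full_mle)
```

Both routes give the same estimator up to the numerical tolerance of the search.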

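The test statistic $T$ from (1.1) is also called the semiparametric excess and it can be
defined as

$$ \breve{L}(\tilde{\theta}) - \breve{L}(\theta^*) = \max_{\upsilon} L(\upsilon) - \max_{\Pi_0 \upsilon = \theta^*} L(\upsilon). $$

The Wilks result can be rewritten as

$$ 2\{\breve{L}(\tilde{\theta}) - \breve{L}(\theta^*)\} \xrightarrow{w} \chi^2_p. $$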

The local asymptotic normality (LAN) approach by Le Cam leads to the most general 
setup in which the Wilks type results can be established. However, the classical theory of 
semiparametric estimation faces serious difficulties when the dimension of the nuisance
parameter becomes large or infinite. The LAN property yields a local approximation of
the log-likelihood of the full model by the log-likelihood of a linear Gaussian model, and
this property is only validated in a root-n neighborhood of the true point. The non- and
semiparametric cases require considering larger neighborhoods where the LAN approach
is no longer applicable. A proper extension of the Wilks result to the case of a growing
or infinite nuisance dimension is quite challenging and involves special constructions like
a pilot consistent estimator of the target, a hardest parametric submodel, as well as some
powerful tools of empirical process theory; see Murphy and Van der Vaart (2000) or
Kosorok (2005) for a comprehensive presentation.

The recent paper Spokoiny (2012) offers a new look at the classical LAN theory. The 
basic idea is to replace the local approximation by local bracketing. Instead of one
approximating Gaussian log-likelihood, one builds two different quadratic processes such
that the original log-likelihood can be sandwiched between them up to a small error.
It appears that the bracketing device can be applied for much larger neighborhoods
than in the LAN approach. In this paper we show that the local bracketing approach of
Spokoiny (2012) can be used for obtaining a version of the Wilks Theorem in a quite
general semiparametric setup avoiding any special construction like "the hardest parametric
submodel".

Another important issue is that the new approach does not rely on any pilot estimator 
of the target. The usual assumption that a consistent pilot estimator is available can be 
even misleading in our setup because it separates local and global considerations. This
paper attempts to work out a list of conditions ensuring global concentration and local
expansion at the same time. In particular, this makes it possible to address the crucial question
of the largest dimensionality of the nuisance parameter for which the Wilks result still holds.
It appears that the profile semiparametric approach is validated under the constraint
$p^{*3} \ll n$, where $p^*$ is the full parameter dimension. It applies even if the dimension $p$
of the target grows with the sample size under the mentioned constraint. The important
identifiability issue is also addressed in a more careful way for the considered finite sample 
case. 

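For the further presentation we have to briefly outline the basic results from Spokoiny
(2012). Introduce the log-likelihood ratio process

$$ L(\upsilon, \upsilon^*) = L(\upsilon) - L(\upsilon^*). $$

The key bracketing result of Spokoiny (2012) claims that $L(\upsilon, \upsilon^*)$ can be sandwiched
on a local elliptic set $\Upsilon_\circ(r)$ around $\upsilon^*$ by two quadratic in $\upsilon$ processes
$\mathbb{L}_{\epsilon}(\upsilon, \upsilon^*)$ and $\mathbb{L}_{\underline{\epsilon}}(\upsilon, \upsilon^*)$:

$$ \mathbb{L}_{\underline{\epsilon}}(\upsilon, \upsilon^*) - \diamondsuit_{\underline{\epsilon}}(r) \le L(\upsilon, \upsilon^*) \le \mathbb{L}_{\epsilon}(\upsilon, \upsilon^*) + \diamondsuit_{\epsilon}(r), \qquad \upsilon \in \Upsilon_\circ(r), \qquad (1.3) $$

where $\diamondsuit_{\epsilon}(r) \ge 0$ and $\diamondsuit_{\underline{\epsilon}}(r) \ge 0$ are small terms. The value $r$ here can be viewed
as the radius of the set $\Upsilon_\circ(r)$ in the intrinsic semimetric corresponding to the process
$L(\upsilon)$. See Section B for a precise formulation. This local result is accompanied with the
deviation bound of the form

$$ \mathbb{P}\bigl(\tilde{\upsilon} \in \Upsilon_\circ(r)\bigr) \ge 1 - e^{-x}, $$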

where $x$ grows almost linearly with $r$. The bracketing result (1.3) yields a number of
important and informative corollaries. One of them shows that the excess $L(\tilde{\upsilon}, \upsilon^*)$ can
be approximated by a quadratic form $\|\xi\|^2/2$, where $\xi \stackrel{\mathrm{def}}{=} \mathcal{D}_0^{-1} \nabla L(\upsilon^*)$ is the normalized
score while $\mathcal{D}_0^2$ approximates the total Fisher information matrix. Another important
corollary of (1.3) is an expansion of the quasi MLE $\tilde{\upsilon}$. The mentioned results can be
written in the form

$$ \bigl| 2L(\tilde{\upsilon}, \upsilon^*) - \|\xi\|^2 \bigr| \le 2\Delta_{\epsilon}, \qquad (1.4) $$
$$ \bigl\| \mathcal{D}_0(\tilde{\upsilon} - \upsilon^*) - \xi \bigr\|^2 \le 2\Delta_{\epsilon}, $$

where $\Delta_{\epsilon}$ is a random term called the spread which is small with a large probability. In a
typical situation with a correctly specified model, $\xi$ is nearly standard normal and hence,
$2L(\tilde{\upsilon}, \upsilon^*)$ is nearly $\chi^2_{p^*}$, where $p^*$ is the full parameter dimension, while the MLE $\tilde{\upsilon}$
is asymptotically normal and efficient. The expansion (1.4) helps to build
likelihood-based confidence sets for the true parameter $\upsilon^*$. Let $\chi_{\alpha}$ be the $(1 - \alpha)$-quantile of the
chi-square distribution with $p^*$ degrees of freedom. Set

$$ \mathcal{E}(\alpha) = \{\upsilon \in \Upsilon : 2L(\tilde{\upsilon}, \upsilon) \le \chi_{\alpha}\}. $$

Then (1.4) ensures that the probability of non-coverage $\mathbb{P}(\upsilon^* \notin \mathcal{E}(\alpha))$ is close to $\alpha$ provided
that $\Delta_{\epsilon}$ is sufficiently small.
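
A minimal numerical check of such a likelihood-based confidence set is sketched below. It uses a toy setup that is not from the paper: n i.i.d. observations from a bivariate normal law with identity covariance, so that twice the excess equals n times the squared distance between the sample mean and the candidate point and is exactly chi-square with 2 degrees of freedom. For 2 degrees of freedom the (1 - alpha)-quantile has the closed form -2 log(alpha), which avoids any external library.

```python
import math
import random

def covers(sample_mean, n, v_star, alpha):
    """Check whether v_star lies in the set {v : 2 L(v_tilde, v) <= z_alpha}
    for n i.i.d. N(v, I_2) observations, where 2 L(v_tilde, v) = n * |mean - v|^2."""
    # For 2 degrees of freedom the chi-square (1 - alpha)-quantile is -2 * log(alpha).
    z_alpha = -2.0 * math.log(alpha)
    excess2 = n * sum((m - v) ** 2 for m, v in zip(sample_mean, v_star))
    return excess2 <= z_alpha

def noncoverage_rate(n=50, reps=4000, alpha=0.1, seed=3):
    rng = random.Random(seed)
    v_star = (0.5, -1.0)
    misses = 0
    for _ in range(reps):
        mean = [sum(rng.gauss(v, 1.0) for _ in range(n)) / n for v in v_star]
        misses += not covers(mean, n, v_star, alpha)
    return misses / reps

rate = noncoverage_rate()
print(f"empirical non-coverage = {rate:.3f}, alpha = 0.1")
```

In this Gaussian case the spread is zero, so the empirical non-coverage differs from alpha only by simulation error.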

This paper aims at establishing similar statements for the process $\breve{L}(\theta)$ from (1.2).
In particular, the Wilks result can be written as

$$ \breve{L}(\tilde{\theta}) - \breve{L}(\theta^*) \approx \|\breve{\xi}\|^2/2, $$

where the random $p$-vector $\breve{\xi}$ satisfies $\mathbb{E}\breve{\xi} = 0$ and $\mathbb{E}\|\breve{\xi}\|^2 = p$. The deviation
properties of $\|\breve{\xi}\|^2$ resemble the ones of a chi-square random variable with $p$ degrees of freedom
just as in the Wilks phenomenon. The expansion of the profile MLE reads as

$$ \breve{D}_0(\tilde{\theta} - \theta^*) \approx \breve{\xi}. $$

The symmetric matrix $\breve{D}_0 \in \mathbb{R}^{p \times p}$ is usually called the influence matrix and it is the
covariance of the efficient influence function; see Kosorok (2005).

Usually in the classical semiparametric setup, the vector $\upsilon$ is represented as $\upsilon =
(\theta, \eta)$, where $\theta$ is the target of analysis while $\eta$ is the nuisance parameter. We refer to
this situation as the $(\theta, \eta)$-setup and our presentation follows this setting. An extension to
the $\upsilon$-setup with $\theta = \Pi_0 \upsilon$ is straightforward. Also for simplicity we only develop our
results for the case that the full parameter space $\Upsilon$ is a subset of the Euclidean space of
dimensionality $p^*$. An extension to an infinite dimensional parameter space is possible
but involves a range of technical issues that have to be addressed elsewhere.

Section 2 introduces the objects and tools of the analysis and collects the main results
including an extension of the Wilks Theorem, concentration properties of the profile
estimator and the construction of confidence sets for the "true" parameter $\theta^*$. The
concentration properties of the profile MLE are discussed in Section D.1. The appendix
collects the conditions and the proofs of the main results.

2 Main results 

This section presents our main results on the semiparametric profile estimator which
include the Wilks expansion of the profile maximum likelihood and the Fisher expansion
of the profile MLE. All the results are stated under the same list of conditions that
can be found in Section A of the appendix. As already mentioned, our setup follows
Spokoiny (2012). However, at one point there is an essential difference. The results of
Spokoiny (2012) are stated for just one fixed finite sample. The same continues to hold
for the results below. But we are also interested in understanding what happens if the
full dimension $p^*$ becomes large. For this we consider below an asymptotic setup with
$p^* = p_n$, where $n$ denotes the asymptotic parameter. It can be viewed as the sample
size with $n \to \infty$. We assume that all considered objects depend on $n$, including the
likelihood function, the full parameter set $\Upsilon$ and its dimension $p^*$, as well as all the
constants in our conditions. The primary goal of our study is to find the necessary and
sufficient conditions on the growth of $p_n$ with $n$ which ensure the Wilks and Fisher results.

Our results apply even if the target parameter $\theta$ is of growing dimension. The
dimension $p$ can be of order $p^*$. The case with a full dimensional target and low dimensional
nuisance is also included.

2.1 The Wilks and Fisher expansion 

This section states the key results in the semiparametric framework which heavily use
the local bracketing idea of Spokoiny (2012). First we introduce the main elements of the
bracketing device. This includes two $p^* \times p^*$ matrices $\mathcal{V}_0$ and $\mathcal{D}_0$ and two constants
$\epsilon = (\delta, \varrho)$. The matrix $\mathcal{V}_0$ describes the variability of the process $L(\upsilon)$ around the true
point $\upsilon^*$:

$$ \mathcal{V}_0^2 = \operatorname{Var}\{\nabla L(\upsilon^*)\}. \qquad (2.1) $$

The matrix $\mathcal{D}_0$ is defined similarly to the Fisher information matrix:

$$ \mathcal{D}_0^2 \stackrel{\mathrm{def}}{=} -\nabla^2 \mathbb{E} L(\upsilon^*). \qquad (2.2) $$

Here and in what follows we implicitly assume that the log-likelihood function $L(\upsilon)$ is
sufficiently smooth in $\upsilon$; $\nabla L(\upsilon)$ stands for the gradient and $\nabla^2 \mathbb{E} L(\upsilon)$ for the Hessian
of the expectation $\mathbb{E} L$ at $\upsilon$. It is worth mentioning that the matrices $\mathcal{D}_0$ and $\mathcal{V}_0$
coincide if the model $Y \sim \mathbb{P}_{\upsilon^*} \in (\mathbb{P}_{\upsilon})$ is correctly specified and sufficiently regular; see
e.g. Ibragimov and Khas'minskij (1981).

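Now we switch to the $(\theta, \eta)$-setup. Consider the block representation of the vector
$\nabla \stackrel{\mathrm{def}}{=} \nabla L(\upsilon^*)$ and of the matrices $\mathcal{V}_0^2$ from (2.1) and $\mathcal{D}_0^2$ from (2.2):

$$ \nabla = \begin{pmatrix} \nabla_{\theta} \\ \nabla_{\eta} \end{pmatrix}, \qquad
\mathcal{D}_0^2 = \begin{pmatrix} D_0^2 & A_0 \\ A_0^\top & H_0^2 \end{pmatrix}, \qquad
\mathcal{V}_0^2 = \begin{pmatrix} V_0^2 & B_0 \\ B_0^\top & Q_0^2 \end{pmatrix}. $$

Define also the $p \times p$ matrix $\breve{D}_0$ and $p$-vectors $\breve{\nabla}_{\theta}$ and $\breve{\xi}$ as

$$ \breve{D}_0^2 = D_0^2 - A_0 H_0^{-2} A_0^\top, $$
$$ \breve{\nabla}_{\theta} = \nabla_{\theta} - A_0 H_0^{-2} \nabla_{\eta}, $$
$$ \breve{\xi} = \breve{D}_0^{-1} \breve{\nabla}_{\theta}. $$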
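
The efficient information is the Schur complement of the nuisance block in the full information matrix, and the efficient score is the target score with its projection on the nuisance score removed. The following sketch illustrates both in the simplest case of scalar blocks (one target and one nuisance coordinate). The numerical values and function names are hypothetical and serve only as an illustration; the quantity `rho` is the scalar analogue of the quantity bounded in the identifiability condition.

```python
def efficient_information(d2, a, h2):
    """Scalar-block version of the Schur complement D^2 - A H^{-2} A^T."""
    return d2 - a * a / h2

def inverse_theta_block(d2, a, h2):
    """(theta, theta) entry of the inverse of the matrix [[d2, a], [a, h2]]."""
    det = d2 * h2 - a * a
    return h2 / det

def efficient_score(grad_theta, grad_eta, a, h2):
    """Scalar-block version of grad_theta - A H^{-2} grad_eta."""
    return grad_theta - a / h2 * grad_eta

d2, a, h2 = 4.0, 1.5, 2.0           # hypothetical information blocks
breve_d2 = efficient_information(d2, a, h2)
rho = a * a / (d2 * h2)             # squared "correlation" of target and nuisance scores
print(breve_d2, 1.0 / inverse_theta_block(d2, a, h2), d2 * (1 - rho))
```

All three printed numbers coincide: the efficient information equals the inverse of the target block of the inverse information, and it shrinks the target block by the factor 1 - rho, which is why rho has to stay below 1.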

In what follows, by $C$ we denote a generic fixed constant. For all results presented
below we assume a sufficiently large value $x$ to be fixed. It determines our level of
overwhelming probability: a generic random set $\Omega(x)$ is of dominating probability if

$$ \mathbb{P}\bigl(\Omega(x)\bigr) \ge 1 - C e^{-x}. $$

Similarly to $p^*$, the value $x$ may depend on the asymptotic parameter $n$: in the asymptotic
setup with a growing sample size $n$ it grows to infinity as well, $x = x_n \to \infty$. A particularly
relevant choice is $x = x_n = C \log n$ for a fixed $C > 0$. We only require that $x_n$ is not too large,
more precisely, $x \le x_c$; see (C.2) from Section C. In the i.i.d. setup $x_c$ is of order $n^{1/2}$.

The other important value to be fixed is $r_0$. This value determines the frontier
between local and global consideration. In the local vicinity $\Upsilon_\circ(r_0)$ of radius $r_0$ we apply
a very accurate local quadratic approximation of the log-likelihood process while outside
of this vicinity a much rougher upper function device can be used; see Section B
for more details. The general rule for the choice of $r_0$ is given by the condition $r_0^2 \ge
C_0(p^* + x)$ for some specific constant $C_0$. The quality of local quadratic approximation
is measured by two functions $\delta(r)$ and $\omega(r)$ shown in local conditions $(ED_1)$, $(\mathcal{L}_0)$ of
Section A. More exactly, it can be described by the quantity $\tau_{\epsilon}$ defined as

$$ \tau_{\epsilon} \stackrel{\mathrm{def}}{=} \delta(r_0) + 3 \nu_0 a^2 \omega(r_0), \qquad (2.3) $$

where the constants $\nu_0$ and $a$ are from conditions $(ED_1)$ and $(\mathcal{I})$ in Section A. The
sub-index $\epsilon$ stands for the pair $\delta(r_0), \omega(r_0)$. Our results implicitly assume that $\tau_{\epsilon}$ is
small. We comment on typical behavior of $\tau_{\epsilon}$ in Section 2.2 in context of i.i.d. models.
The first result can be viewed as an extension of the Wilks Theorem.

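Theorem 2.1. Assume $(ED_0)$, $(ED_1)$, $(\mathcal{L}_0)$, $(\mathcal{I})$, $(\mathcal{L}r)$ and $(Er)$ with $b(r) \equiv b$; see
Section A. Let also $\tau_{\epsilon}$ from (2.3) fulfill $\tau_{\epsilon} \le 1/2$. Then it holds on a random set $\Omega(x)$
of dominating probability

$$ \bigl| 2\breve{L}(\tilde{\theta}, \theta^*) - \|\breve{\xi}\|^2 \bigr| \le C \tau_{\epsilon} (p^* + x), \qquad (2.4) $$

where $\breve{L}(\tilde{\theta}, \theta^*) = \breve{L}(\tilde{\theta}) - \breve{L}(\theta^*)$.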

Remark 2.1. In the case of the correct model specification with $\mathcal{D}_0^2 = \mathcal{V}_0^2$, the deviation
properties of the quadratic form $\|\breve{\xi}\|^2 = \|\breve{D}_0^{-1} \breve{\nabla}_{\theta}\|^2$ are essentially the same as of a
chi-square random variable with $p$ degrees of freedom; see Theorem C.1 in the appendix. In
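the case of a possible model misspecification with $\mathcal{D}_0^2 \ne \mathcal{V}_0^2$, the behavior of the quadratic
form $\|\breve{\xi}\|^2$ will depend on the characteristics of the matrix $\mathbb{B} = \breve{D}_0^{-1} \breve{V}_0^2 \breve{D}_0^{-1}$,
where $\breve{V}_0^2 = \operatorname{Var}(\breve{\nabla}_{\theta})$; see again Theorem C.1. Moreover, in the asymptotic setup
the vector $\breve{\xi}$ is asymptotically standard normal; see Section 2.2 for the i.i.d. case.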

Remark 2.2. The partial maximum likelihood process $\breve{L}(\theta)$ can be used for defining
the likelihood-based confidence sets of the form

$$ \mathcal{E}(\mathfrak{z}) = \{\theta : \breve{L}(\tilde{\theta}, \theta) \le \mathfrak{z}\} $$

for some $\mathfrak{z} > 0$. The bound (2.4) helps to evaluate the probability of non-coverage $\mathbb{P}(\theta^* \notin \mathcal{E}(\mathfrak{z}))$
in terms of deviation probability for the quadratic form $\|\breve{\xi}\|^2$; cf. Corollary 3.2 in
Spokoiny (2012).

The next result presents an expansion of the profile MLE.

Theorem 2.2. Under the conditions of Theorem 2.1, it holds on a random set $\Omega(x)$ of
dominating probability

$$ \bigl\| \breve{D}_0(\tilde{\theta} - \theta^*) - \breve{\xi} \bigr\|^2 \le C \tau_{\epsilon} (p^* + x). \qquad (2.5) $$

Remark 2.3. One can use the expansion (2.5) for describing the concentration
probability for elliptic sets

$$ \mathcal{A}(z) = \{\theta : \|\breve{D}_0(\theta - \theta^*)\| \le z\}; $$

cf. Corollary 3.5 in Spokoiny (2012).
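
In a linear Gaussian model the log-likelihood is exactly quadratic, so both expansions hold with zero error term. The sketch below checks this numerically in a toy regression that is an assumption of this illustration, not an example from the paper: one target coefficient, one nuisance coefficient, and standard normal errors. It verifies that the normalized estimation error of the profile MLE equals the efficient normalized score, and that twice the profile excess equals its square.

```python
import math
import random

def profile_quantities(x, z, y, theta_star, eta_star):
    sxx = sum(a * a for a in x)
    sxz = sum(a * b for a, b in zip(x, z))
    szz = sum(b * b for b in z)
    sxy = sum(a * c for a, c in zip(x, y))
    szy = sum(b * c for b, c in zip(z, y))
    # Full MLE from the 2x2 normal equations; theta_tilde is its theta-component.
    det = sxx * szz - sxz * sxz
    theta_tilde = (szz * sxy - sxz * szy) / det
    eta_tilde = (sxx * szy - sxz * sxy) / det
    # Score at the true point and its efficient (profiled) version.
    resid = [c - theta_star * a - eta_star * b for a, b, c in zip(x, z, y)]
    grad_theta = sum(a * e for a, e in zip(x, resid))
    grad_eta = sum(b * e for b, e in zip(z, resid))
    breve_d = math.sqrt(sxx - sxz * sxz / szz)
    breve_xi = (grad_theta - sxz / szz * grad_eta) / breve_d
    # Profile excess: the constrained fit keeps theta = theta_star and optimizes eta.
    eta_null = (szy - theta_star * sxz) / szz
    rss_full = sum((c - theta_tilde * a - eta_tilde * b) ** 2 for a, b, c in zip(x, z, y))
    rss_null = sum((c - theta_star * a - eta_null * b) ** 2 for a, b, c in zip(x, z, y))
    excess2 = rss_null - rss_full   # twice the profile log-likelihood excess
    return breve_d * (theta_tilde - theta_star), breve_xi, excess2

rng = random.Random(4)
n, theta_star, eta_star = 100, 1.0, -2.0
x = [rng.gauss(0, 1) for _ in range(n)]
z = [0.6 * a + rng.gauss(0, 1) for a in x]      # correlated nuisance regressor
y = [theta_star * a + eta_star * b + rng.gauss(0, 1) for a, b in zip(x, z)]
fisher_lhs, xi, excess2 = profile_quantities(x, z, y, theta_star, eta_star)
print(fisher_lhs, xi, excess2, xi ** 2)
```

For non-Gaussian or nonlinear models the two identities hold only approximately, with the error controlled as in the theorems above.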

In the next section the result (2.5) is used to show asymptotic normality and efficiency 
of the profile estimator in the i.i.d. setting and under the correct model specification. 
2.2 The i.i.d. case and asymptotic efficiency

Here we briefly discuss the implications of our general results to the case with $Y =
(Y_1, \ldots, Y_n)^\top$ where observations $Y_i$ are i.i.d. from a measure $P$. The parametric
assumption means $P = P_{\upsilon^*} \in (P_{\upsilon}, \upsilon \in \Upsilon)$ for a given parametric family $(P_{\upsilon})$, where
$\Upsilon$ is a subset of the Euclidean space $\mathbb{R}^{p^*}$. We assume that $(P_{\upsilon})$ obeys the regularity
conditions listed in Section 5.1 of Spokoiny (2012). By $\ell(y, \upsilon)$ we denote the log-density
of $P_{\upsilon}$ w.r.t. some dominating measure $\mu_0$. For simplicity of comparison with the
classical results we do not discuss the model misspecification issue, i.e. the parametric
assumption is correct. However, an extension to the case of a misspecified model is
straightforward. We utilize that $\mathcal{V}_0^2 = \mathcal{D}_0^2 = n\mathbb{F}$, $\omega(r) = \omega^* r / n^{1/2}$, $\delta(r) = \delta^* r / n^{1/2}$,
and $g = g_1 \sqrt{n}$; see Lemma 5.1 in Spokoiny (2012). Here $\mathbb{F}$ is the Fisher information
matrix of the family $(P_{\upsilon})$ at the point $\upsilon^*$, and $\omega^*$, $\delta^*$, and $g_1$ are some positive
constants.

It is shown in Spokoiny (2012) that the full parameter $\upsilon^*$ can be well estimated
provided that $p^*/n$ is sufficiently small. More precisely, the concentration property for
the set $\Upsilon_\circ(r)$ requires $r^2 \ge C p^*$ for a fixed $C$, while the local bracketing device is
validated up to the spread $\Delta_{\epsilon}(r)$ which is of order $p^* \delta(r) \asymp p^* r / n^{1/2} \asymp p^{*3/2} / n^{1/2}$.
The range of applicability for the proposed approach can be informally defined by the
rule "the spread is smaller than the value of the problem", where the value of the problem
is understood as the expected excess. If the full parameter $\upsilon$ is estimated, the value
of the problem is of order $p^*$ leading to the constraint "$p^*/n$ is small". If the target
parameter is of dimension $p$, then the value of the problem is also of order $p$ leading to
the constraint "$p^{*3/2} / (n^{1/2} p)$ is small".
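
The rule "the spread is smaller than the value of the problem" is easy to tabulate. The sketch below drops all constants and simply evaluates the two ratios from the paragraph above for a few hypothetical combinations of the full dimension, the target dimension and the sample size; the function names are introduced here only for illustration.

```python
import math

def spread_order(p_star, n):
    """Order of the bracketing spread, p*^{3/2} / n^{1/2} (constants dropped)."""
    return p_star ** 1.5 / math.sqrt(n)

def relative_spread(p_star, n, value):
    """Spread divided by the value of the problem (of order p* or p)."""
    return spread_order(p_star, n) / value

n = 10 ** 6
for p_star, p in [(10, 1), (100, 1), (100, 50)]:
    full = relative_spread(p_star, n, p_star)   # equals sqrt(p*/n)
    target = relative_spread(p_star, n, p)      # p*^{3/2} / (n^{1/2} p)
    print(p_star, p, round(full, 4), round(target, 4))
```

With n equal to one million and a one-dimensional target, the target ratio is already of order one at full dimension 100, which is exactly the cube-root threshold.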

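Now we specify the results in the $(\theta, \eta)$ semiparametric setup. To state the result we
only need a version of the identifiability condition $(\mathcal{I})$ on the marginal distribution. Let
$\mathbb{F}$ be the Fisher information matrix of the family $(P_{\upsilon})$ at the true point $\upsilon^*$. Consider
its block representation

$$ \mathbb{F} = \begin{pmatrix} F_{\theta\theta} & F_{\theta\eta} \\ F_{\theta\eta}^\top & F_{\eta\eta} \end{pmatrix}. $$

The required identifiability condition reads as follows:

$(\iota)$ There is a constant $\rho < 1$ such that

$$ \bigl\| F_{\theta\theta}^{-1/2} F_{\theta\eta} F_{\eta\eta}^{-1} F_{\theta\eta}^\top F_{\theta\theta}^{-1/2} \bigr\|_{\infty} \le \rho. \qquad (2.6) $$

Also define

$$ \breve{F} \stackrel{\mathrm{def}}{=} F_{\theta\theta} - F_{\theta\eta} F_{\eta\eta}^{-1} F_{\theta\eta}^\top. $$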
The presented result admits that the full dimension $p^*$ grows with the sample size but
slower than $n^{1/3}$. The result is applicable even in the case when the target dimension
also depends on the sample size.

Theorem 2.3. Let $Y_1, \ldots, Y_n$ be i.i.d. $\mathbb{P}_{\upsilon^*}$ and let $(ed_0)$, $(ed_1)$, $(\ell_0)$, $(eu)$, and $(\ell u)$
with $b(u) \equiv b$ of Spokoiny (2012) hold. In addition, assume $(\iota)$; see (2.6). Define for
$x = x_n \le n^{1/3}$

$$ \beta_n \stackrel{\mathrm{def}}{=} (p^* + x_n)^{3/2} n^{-1/2}. $$

It holds on a set $\Omega(x_n)$ of dominating probability:

$$ \bigl\| (n\breve{F})^{1/2} (\tilde{\theta} - \theta^*) - \breve{\xi} \bigr\|^2 \le C \beta_n, $$
$$ \bigl| 2\breve{L}(\tilde{\theta}, \theta^*) - \|\breve{\xi}\|^2 \bigr| \le C \beta_n. $$

Moreover, the $p$-vector $\breve{\xi} \stackrel{\mathrm{def}}{=} (n\breve{F})^{-1/2} (\nabla_{\theta} - F_{\theta\eta} F_{\eta\eta}^{-1} \nabla_{\eta})$ is asymptotically standard normal
as $n \to \infty$. This yields the asymptotic efficiency of the profile MLE.

2.3 Critical dimension 

This section discusses the issue of a critical dimension. Namely, we assume that the full
dimension $p^*$ grows with the sample size $n$ and write $p^* = p_n$. Theorem 2.3 requires
that $p_n = o(n^{1/3})$. Here we show that this condition is critical for the class of models
satisfying the conditions of Section A. Namely, we present an example in which the
behavior of the profile MLE $\tilde{\theta}$ heavily depends on the value $\beta_n = \sqrt{p_n^3/n}$. If
$\beta_n \to 0$, then the conditions of Section A are satisfied yielding asymptotic efficiency of
$\tilde{\theta}$. At the same time, if $\beta_n \ge \beta > 0$, then the MLE is no longer root-n consistent.
Assume that $p_n/\sqrt{n} \to 0$. Let a random vector $X \in \mathbb{R}^{p_n}$ follow $X \sim \mathcal{N}(\upsilon^*, n^{-1} I_{p_n})$.
Take for simplicity $\upsilon^* = 0$ and let $\mathbb{P} = \mathbb{P}_0$ mean the distribution of $X$. Introduce a
special set $\mathbb{S} \subset \mathbb{R}^{p_n}$ with

§ d = j-j, = ( Vl ,...,v Pn ) : vi = |VAi/ n > 

nr„ (vW" + ^\//V«) • ( 2 -7) 

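We denote by $\mathbb{S}_{\delta}$ its $\delta$-vicinity:

$$ \mathbb{S}_{\delta} = \{\upsilon : d(\upsilon, \mathbb{S}) \le \delta\}, $$

where $d(\upsilon, \mathbb{S})$ is the Euclidean distance from the point $\upsilon$ to the set $\mathbb{S}$. Also $\mathbb{S}_{\delta}^c$ stands
for the complement of $\mathbb{S}_{\delta}$. Below we fix $\delta = 1/n$. Consider a special parametric quasi
log-likelihood ratio $L(\upsilon, 0)$ defined as

$$ L(\upsilon, 0) = n X^\top \upsilon - n\|\upsilon\|^2/2 + n f(\upsilon) \|\upsilon\|^3. $$

Here $f : \mathbb{R}^{p_n} \to \mathbb{R}$ is a smooth function with

$$ f(\upsilon) = \begin{cases} 1, & \upsilon \in \mathbb{S}, \\ 0, & \upsilon \in \mathbb{S}_{\delta}^c. \end{cases} $$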



def 

Below we consider the problem of estimating the first component = v\ E M . Since 
by assumption p n /yfn — > it holds for n large enough and for any v with \\v || 2 < 
^Pn/n + fi n /n that n||«|| 2 /2 > n/(r>)||r>|| 3 and thus 

argmax IE "£(v) = argmin|n||i>|| 2 /2 — n/(^)||^|| 3 } = 0. 

V V 

1/2 

It is easy to see that all conditions from Section A are satisfied with T e p n — Pn and 

D 2 = Vl = nl pn . 

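Therefore, the results from Section 2.1 yield efficiency of the profile MLE $\tilde{\theta}$ if $p_n^3/n \to 0$.
Moreover, it is straightforward to see that

$$ \breve{D}_0 = \sqrt{n}, \qquad \breve{\nabla}_{\theta}(L - \mathbb{E}L) = \nabla_{\theta}(L - \mathbb{E}L) = n X_1 \qquad \text{and} \qquad \breve{\xi} = \sqrt{n}\, X_1. $$

It follows similarly to Theorem 2.1 that if $\beta_n^2 = p_n^3/n \to 0$ then

$$ \bigl\| \breve{D}_0(\tilde{\theta} - \theta^*) - \breve{\xi} \bigr\| = \sqrt{n}\, |\tilde{\upsilon}_1 - X_1| \to 0. $$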



The next result shows that in the case when $\beta_n = \sqrt{p_n^3/n}$ is not small, the profile MLE
$\tilde{\theta}$ is not root-n consistent.

Theorem 2.4. Suppose that j3 n — > (6c) 2 for some c > . Let also n be large enough 
to ensure 

2 1/3 - 1 i—r- . i . , ,3/4 

2l/6 VPn/n > - {Pn/n) . 
There exists a positive a > such that it holds with a probability exceeding a 

\\D (0 _ o*) _ ||| > 1^/2 _ _L > c _ 0n (i). 
6 v/n 



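If $\beta_n \to \infty$, then

$$ \bigl\| \breve{D}_0(\tilde{\theta} - \theta^*) - \breve{\xi} \bigr\| \xrightarrow{\ \mathbb{P}\ } +\infty, $$

where $\xrightarrow{\ \mathbb{P}\ }$ means convergence in probability.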
A Appendix



The appendix collects our conditions and proofs of the main results. 

We adopt the conditions from Section 2 of Spokoiny (2012) with the obvious change
of notations. The local conditions only describe the properties of the process $L(\upsilon)$ for
$\upsilon \in \Upsilon_\circ(r_0)$ with some fixed value $r_0$. The global conditions have to be fulfilled on the
whole $\Upsilon$. We start with the local conditions.

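$(ED_0)$ There exists a constant $\nu_0 > 0$, a positive symmetric $p^* \times p^*$ matrix $\mathcal{V}_0^2$
satisfying $\operatorname{Var}\{\nabla\zeta(\upsilon^*)\} \le \mathcal{V}_0^2$, and a constant $g > 0$ such that for all $|\mu| \le g$

$$ \sup_{\gamma \in \mathbb{R}^{p^*}} \log \mathbb{E} \exp\Bigl\{ \mu \frac{\langle \nabla\zeta(\upsilon^*), \gamma \rangle}{\|\mathcal{V}_0 \gamma\|} \Bigr\} \le \frac{\nu_0^2 \mu^2}{2}. $$

Here and below $\zeta(\upsilon) = L(\upsilon) - \mathbb{E}L(\upsilon)$ denotes the stochastic component of the
log-likelihood, as in Spokoiny (2012).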



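$(ED_1)$ For all $0 < r < r_0$, there exists a constant $\omega(r) \le 1/2$ such that for all
$\upsilon \in \Upsilon_\circ(r)$ and $|\mu| \le g$

$$ \sup_{\gamma \in \mathbb{R}^{p^*}} \log \mathbb{E} \exp\Bigl\{ \mu \frac{\langle \gamma, \nabla\zeta(\upsilon) - \nabla\zeta(\upsilon^*) \rangle}{\omega(r) \|\mathcal{V}_0 \gamma\|} \Bigr\} \le \frac{\nu_0^2 \mu^2}{2}. $$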



$(\mathcal{L}_0)$ There exists a symmetric $p^* \times p^*$ matrix $\mathcal{D}_0^2$ such that it holds on the
set $\Upsilon_\circ(r_0)$ for all $r \le r_0$



VE£(v,v*) - T>l(v - v*) 



\\T>o(v - v* 

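This condition together with the identity $\nabla \mathbb{E} L(\upsilon^*) = 0$ implies

$$ \Bigl| \frac{-2\,\mathbb{E} L(\upsilon, \upsilon^*)}{\|\mathcal{D}_0(\upsilon - \upsilon^*)\|^2} - 1 \Bigr| \le \delta(r). $$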



The global conditions are: 



(£r) For any r > ro there exists a value b(r) > , such that 

\\V (v-v*W ~ b(r) ' 
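$(Er)$ For any $r > r_0$ there exists a constant $\nu_0 > 0$ and a constant $g(r) > 0$ such that

$$ \sup_{\upsilon \in \Upsilon_\circ(r)} \ \sup_{|\mu| \le g(r)} \ \sup_{\gamma \in \mathbb{R}^{p^*}} \log \mathbb{E} \exp\Bigl\{ \mu \frac{\langle \nabla\zeta(\upsilon), \gamma \rangle}{\|\mathcal{V}_0 \gamma\|} \Bigr\} \le \frac{\nu_0^2 \mu^2}{2}. $$

Our results are stated for $g(r) \equiv g > 0$; however, an extension to the case $g(r) \to 0$
can be made similarly to Spokoiny (2012).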
Finally we specify the regularity conditions. We begin by representing the information
and the covariance matrices in block form:

$$ \mathcal{D}_0^2 = \begin{pmatrix} D_0^2 & A_0 \\ A_0^\top & H_0^2 \end{pmatrix}, \qquad
\mathcal{V}_0^2 = \begin{pmatrix} V_0^2 & B_0 \\ B_0^\top & Q_0^2 \end{pmatrix}. $$

The identifiability conditions in Spokoiny (2012) ensure that the matrix $\mathcal{D}_0$ is positive
and satisfies $a^2 \mathcal{D}_0^2 \ge \mathcal{V}_0^2$ for some $a > 0$. Here we restate these conditions in the special
block form which is specific for the $(\theta, \eta)$-setup.

$(\mathcal{I})$ There are constants $a > 0$ and $\rho < 1$ such that

$$ a^2 D_0^2 \ge V_0^2, \qquad a^2 H_0^2 \ge Q_0^2, \qquad a^2 \mathcal{D}_0^2 \ge \mathcal{V}_0^2, \qquad (A.1) $$

and

$$ \bigl\| D_0^{-1} A_0 H_0^{-2} A_0^\top D_0^{-1} \bigr\|_{\infty} \le \rho. \qquad (A.2) $$

The quantity $\rho$ bounds the angle between the target and nuisance subspaces in the
tangent space. The regularity condition $(\mathcal{I})$ ensures that this angle is not too small and
hence, the target and nuisance parameters are identifiable. In particular, the matrix $\breve{D}_0^2$
is well posed under $(\mathcal{I})$.

The bounds in (A.1) are given with the same constant $a$ only for simplifying the
notation. One can show that the last bound on $\mathcal{D}_0^2$ follows from the first two and (A.2)
with another constant $a'$ depending on $a$ and $\rho$ only.

B Bracketing and upper function devices 

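This section briefly overviews the main constructions of Spokoiny (2012) including the
bracketing bound and the upper function results. The bracketing bound describes the
quality of quadratic approximation of the log-likelihood process $L(\upsilon)$ in a local vicinity
of the point $\upsilon^*$, while the upper function method is used to show that the full MLE $\tilde{\upsilon}$
belongs to this vicinity with a dominating probability. Given $r > 0$, define the local set

$$ \Upsilon_\circ(r) \stackrel{\mathrm{def}}{=} \{\upsilon : (\upsilon - \upsilon^*)^\top \mathcal{V}_0^2 (\upsilon - \upsilon^*) \le r^2\}. $$

For $\epsilon = (\delta, \varrho)$, define the bracketing quadratic processes $\mathbb{L}_{\epsilon}(\upsilon, \upsilon^*)$ and $\mathbb{L}_{\underline{\epsilon}}(\upsilon, \upsilon^*)$:

$$ \mathbb{L}_{\epsilon}(\upsilon, \upsilon^*) \stackrel{\mathrm{def}}{=} (\upsilon - \upsilon^*)^\top \nabla L(\upsilon^*) - \|\mathcal{D}_{\epsilon}(\upsilon - \upsilon^*)\|^2/2, $$

$$ \mathcal{D}_{\epsilon}^2 \stackrel{\mathrm{def}}{=} \mathcal{D}_0^2 (1 - \delta) - \varrho \mathcal{V}_0^2, \qquad (B.1) $$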
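and accordingly for $\underline{\epsilon} = -\epsilon = (-\delta, -\varrho)$. The next result restates the local bracketing
bound of Spokoiny (2012) in the semiparametric framework. The imposed conditions and
the involved constants $\nu_0$, $\delta(r)$, and $\omega(r)$ are explained in Section A. The presented
results implicitly assume that $p^*$ is large, and that $x$ is large as well to ensure that $e^{-x}$ is
negligible. A proper choice is $x = C p^*$ for a fixed $C$.

Theorem B.1 (Spokoiny (2012), Theorem 3.1). Assume $(ED_1)$ and $(\mathcal{L}_0)$. Let for some
$r$, the values $\varrho \ge 3\nu_0 \omega(r)$ and $\delta \ge \delta(r)$ be such that $\mathcal{D}_0^2 (1 - \delta) - \varrho \mathcal{V}_0^2 \ge 0$. Then

$$ \mathbb{L}_{\underline{\epsilon}}(\upsilon, \upsilon^*) - \diamondsuit_{\underline{\epsilon}}(r) \le L(\upsilon, \upsilon^*) \le \mathbb{L}_{\epsilon}(\upsilon, \upsilon^*) + \diamondsuit_{\epsilon}(r), \qquad \upsilon \in \Upsilon_\circ(r), $$

where the random variables $\diamondsuit_{\epsilon}(r), \diamondsuit_{\underline{\epsilon}}(r)$ fulfill on a random set $\Omega(x)$ of dominating
probability

$$ \diamondsuit_{\epsilon}(r) \le C \varrho (p^* + x), \qquad \diamondsuit_{\underline{\epsilon}}(r) \le C \varrho (p^* + x). \qquad (B.2) $$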
In fact, Theorem 3.1 of Spokoiny (2012) states the following bound: 
F{e _1 £ (r) > 3(Q,x)} < exp(-x). 

with Q = 2Ap* and 

f (1 + v^) 2 ifl + v^+Q<^ 

1 ( 1 + f( x + Q) + 2^} 2 otherwise. 

Under the assumption that g is sufficiently large, that is, g/z^o 3> P* > w e can apply 
3(Q, x) x + Q < C(p* + x) , and the result of Theorem B.l follows. 

The bracketing result of Theorem B.1 is local in the sense that it only applies for
$\upsilon \in \Upsilon_\circ(r)$. Following the general approach of Spokoiny (2012) we accompany it with
the large deviation bound on the concentration probability $\mathbb{P}(\tilde{\upsilon} \in \Upsilon_\circ(r))$ when the local
radius $r$ exceeds some level $r_0$ which has to be sufficiently large, namely $r_0^2 \ge C p^*$. We
adopt the upper function approach from Spokoiny (2012); cf. Corollary 4.4 therein.
Again the constants $g(r)$ and $b(r)$ are introduced in Section A.

Theorem B.2 (Spokoiny (2012), Theorem 4.1). Suppose $(\mathcal{L}r)$ and $(Er)$ with $b(r) \equiv b$.
If for $r > r_0$, the following conditions are fulfilled:

$$ 1 + \sqrt{x + Q} \le 3\nu_0^2 g(r)/b, \qquad 6\nu_0 \sqrt{x + Q} \le r b, \qquad (B.3) $$

then $\tilde{\upsilon} \in \Upsilon_\circ(r_0)$ on a random set $\Omega(x)$ of dominating probability. The same bound holds
for the probability of $\tilde{\upsilon}_{\theta^*} \in \Upsilon_\circ(r_0)$, where $\tilde{\upsilon}_{\theta^*}$ maximizes $L(\upsilon, \upsilon^*)$ subject to $\Pi_0 \upsilon = \theta^*$:

$$ \tilde{\upsilon}_{\theta^*} \stackrel{\mathrm{def}}{=} \operatorname*{argmax}_{\upsilon \in \Upsilon,\ \Pi_0 \upsilon = \theta^*} L(\upsilon, \upsilon^*). $$

Remark B.1. The condition (B.3) helps to understand which $r_0$ ensures prescribed
concentration properties of $\tilde{\upsilon}$ and $\tilde{\upsilon}_{\theta^*}$. Namely, if $g(r)$ is large enough, then (B.3)
follows from the bound

$$ r \ge 6 b^{-1} \nu_0 \sqrt{x + Q}. $$

C Deviation bounds for quadratic forms 

The following general result from Spokoiny (2013) helps to control the deviation for 
quadratic forms of type ||iB£|| 2 for a given positive matrix IB and a random vector £ . 
It will be used several times in our proofs. Suppose that 

logJ£exp( 7 T £) < || 7 || 2 /2, 7 G H P , IMI < g- (CI) 

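For a symmetric matrix $\mathbb{B}$, define

$$
p = \operatorname{tr}(\mathbb{B}^2), \qquad v^2 = 2\operatorname{tr}(\mathbb{B}^4), \qquad
\lambda^* \stackrel{\mathrm{def}}{=} \|\mathbb{B}^2\|_\infty \stackrel{\mathrm{def}}{=} \lambda_{\max}(\mathbb{B}^2).
$$

We suppose that $\lambda^* \le 1$; otherwise we should replace $\mathbb{B}$ everywhere with $\mathbb{B}/\lambda^*$.
Let $g$ be as in (C.1). Define $\omega_c$ by the equation

$$
\frac{\omega_c (1 + \omega_c)}{(1 + \omega_c^2)^{1/2}} = g p^{-1/2}.
$$

Define also $\mu_c = \omega_c^2/(1 + \omega_c^2) \wedge 2/3$. Note that $\omega_c^2 \ge 2$ implies $\mu_c = 2/3$. Further define

$$
y_c^2 = (1 + \omega_c^2)\, p, \qquad 2 x_c = \mu_c y_c^2 + \log\det\{I_p - \mu_c \mathbb{B}^2\}. \qquad \text{(C.2)}
$$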

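Theorem C.1 (Spokoiny (2013)). Let $\xi$ fulfill (C.1) with $g^2 \ge 2p$. Then we have for
$x \le x_c$ with $x_c$ from (C.2):

$$
\mathbb{P}\bigl(\|\mathbb{B}\xi\|^2 > \mathfrak{z}(x, \mathbb{B})\bigr) \le 2e^{-x} + 8.4\, e^{-x_c},
\qquad
\mathfrak{z}(x, \mathbb{B}) \stackrel{\mathrm{def}}{=}
\begin{cases}
p + 2 v x^{1/2}, & x \le v/18, \\
p + 6x, & v/18 < x \le x_c.
\end{cases}
$$

For $x > x_c$

$$
\mathbb{P}\bigl(\|\mathbb{B}\xi\|^2 > \mathfrak{z}_c(x, \mathbb{B})\bigr) \le 8.4\, e^{-x},
\qquad
\mathfrak{z}_c(x, \mathbb{B}) \stackrel{\mathrm{def}}{=} \bigl| y_c + 2(x - x_c)/g \bigr|^2.
$$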
It appears that the bound is slightly different in two zones separated by some specific
value $x_c$ from (C.2). This value is large in typical situations, as $x_c \asymp g$ (it is of order $\sqrt{n}$ in the
i.i.d. case). For $x \le x_c$, we obtain the same type of bounds as in the Gaussian case; for
$x > x_c$ they are a bit worse.
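As a rough numerical illustration of the first zone of Theorem C.1, the following sketch compares a Monte Carlo estimate of the tail probability with the term $2e^{-x}$. It assumes a standard Gaussian $\xi$, for which (C.1) holds for every $g$, so $x_c$ is treated as infinite and the term $8.4\,e^{-x_c}$ is dropped. The spectrum `eigs` and the function names are hypothetical choices made for this example only.

```python
import math
import random


def deviation_threshold(eigs_b2, x):
    """Threshold z(x, B) of Theorem C.1, computed from the eigenvalues of B^2.

    Uses p = tr(B^2) and v^2 = 2 tr(B^4); assumes lambda_max(B^2) <= 1.
    """
    p = sum(eigs_b2)
    v = math.sqrt(2.0 * sum(lam * lam for lam in eigs_b2))
    if x <= v / 18.0:
        return p + 2.0 * v * math.sqrt(x)
    return p + 6.0 * x


def empirical_tail(eigs_b2, x, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(||B xi||^2 > z(x, B)) for standard Gaussian xi."""
    rng = random.Random(seed)
    z = deviation_threshold(eigs_b2, x)
    hits = 0
    for _ in range(n_draws):
        # In the eigenbasis of B, ||B xi||^2 = sum_j lambda_j * xi_j^2.
        q = sum(lam * rng.gauss(0.0, 1.0) ** 2 for lam in eigs_b2)
        hits += q > z
    return hits / n_draws


eigs = [1.0, 0.8, 0.5, 0.2]  # hypothetical eigenvalues of B^2
for x in (0.1, 1.0, 3.0):
    print(x, deviation_threshold(eigs, x), empirical_tail(eigs, x), 2.0 * math.exp(-x))
```

The simulation only checks the bound in one Gaussian example; it says nothing about the non-Gaussian zone $x > x_c$.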

D Proofs 

This section collects the proofs of the results in chronological order. 

D.1 Proof of Theorem 2.1

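Define the $m \times m$ matrices $H_\epsilon$ and $H_{\underline\epsilon}$ by

$$
H_\epsilon^2 = H_0^2 (1 - \delta) - \varrho\, Q_0^2, \qquad
H_{\underline\epsilon}^2 = H_0^2 (1 + \delta) + \varrho\, Q_0^2;
$$

cf. (B.1). Below we fix some constant $r$ which is assumed to be large enough to ensure
a dominating probability for the concentration event $C_\epsilon(r)$ defined as

$$
C_\epsilon(r) \stackrel{\mathrm{def}}{=} \bigl\{ \|\mathcal{V}_0(\tilde\upsilon - \upsilon^*)\| \le r,\ \|\mathcal{V}_0(\tilde\upsilon_{\theta^*} - \upsilon^*)\| \le r,\
\|\mathcal{V}_0 \mathcal{D}_\epsilon^{-2} \nabla\| \le r,\ \|Q_0 H_\epsilon^{-2} \nabla_\eta\| \le r \bigr\}. \qquad \text{(D.1)}
$$

Note that the conditions $\|\mathcal{V}_0(\tilde\upsilon - \upsilon^*)\| \le r$ and $\|\mathcal{V}_0(\tilde\upsilon_{\theta^*} - \upsilon^*)\| \le r$ can be represented
as $\{\tilde\upsilon \in \Upsilon_0(r)\}$ and $\{\tilde\upsilon_{\theta^*} \in \Upsilon_0(r)\}$. A similar representation holds for

$$
\upsilon_\epsilon \stackrel{\mathrm{def}}{=} \upsilon^* + \mathcal{D}_\epsilon^{-2} \nabla = \operatorname*{argmax}_{\upsilon} \mathbb{L}_\epsilon(\upsilon, \upsilon^*),
\qquad
\eta_\epsilon \stackrel{\mathrm{def}}{=} \eta^* + H_\epsilon^{-2} \nabla_\eta = \operatorname*{argmax}_{\upsilon \in \Upsilon,\ \Pi_0 \upsilon = \theta^*} \mathbb{L}_\epsilon(\upsilon, \upsilon^*).
$$

For instance, $\{\|\mathcal{V}_0 \mathcal{D}_\epsilon^{-2} \nabla\| \le r\} = \{\upsilon_\epsilon \in \Upsilon_0(r)\}$. Later we show that a proper choice of
$r$ ensures a dominating probability of the random set $C_\epsilon(r)$; see Section D.1.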
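We first show that the bound (2.4) is fulfilled on the set $C_\epsilon(r)$ from (D.1) with

$$
\Delta_\epsilon^+(r) = \diamondsuit_\epsilon(r) + \diamondsuit_{\underline\epsilon}(r) + \frac{\alpha_\epsilon}{2} \|\mathcal{D}_0^{-1} \nabla\|^2 + \frac{\alpha_{\underline\epsilon}}{2} \|H_0^{-1} \nabla_\eta\|^2, \qquad \text{(D.2)}
$$

$$
\Delta_\epsilon^-(r) = \diamondsuit_\epsilon(r) + \diamondsuit_{\underline\epsilon}(r) + \frac{\alpha_{\underline\epsilon}}{2} \|\mathcal{D}_0^{-1} \nabla\|^2 + \frac{\alpha_\epsilon}{2} \|H_0^{-1} \nabla_\eta\|^2. \qquad \text{(D.3)}
$$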

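In analogy with Spokoiny (2012), the quantity $\Delta_\epsilon(r)$ with

$$
\Delta_\epsilon(r) = \Delta_\epsilon^+(r) + \Delta_\epsilon^-(r)
= 2\diamondsuit_\epsilon(r) + 2\diamondsuit_{\underline\epsilon}(r) + \frac{\alpha_\epsilon + \alpha_{\underline\epsilon}}{2} \bigl( \|\mathcal{D}_0^{-1} \nabla\|^2 + \|H_0^{-1} \nabla_\eta\|^2 \bigr), \qquad \text{(D.4)}
$$

can be called the semiparametric spread. It can be seen as a payment for the bracketing
device. Below we show that $\Delta_\epsilon(r) \le C \tau_\epsilon (p^* + x)$ with a dominating probability.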

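We start with some technical results about the maximum of the upper and lower
quadratic processes $\mathbb{L}_\epsilon(\upsilon, \upsilon^*)$ and $\mathbb{L}_{\underline\epsilon}(\upsilon, \upsilon^*)$. Recall the notation $\nabla \stackrel{\mathrm{def}}{=} \nabla L(\upsilon^*)$.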

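Lemma D.1. It holds

$$
\sup_{\upsilon} \mathbb{L}_\epsilon(\upsilon, \upsilon^*) = \frac{1}{2} \|\mathcal{D}_\epsilon^{-1} \nabla\|^2, \qquad
\sup_{\upsilon} \mathbb{L}_{\underline\epsilon}(\upsilon, \upsilon^*) = \frac{1}{2} \|\mathcal{D}_{\underline\epsilon}^{-1} \nabla\|^2, \qquad \text{(D.5)}
$$

where $\sup_{\upsilon}$ means the maximum over all $\upsilon \in \mathbb{R}^{p^*}$. Moreover, on the random set
$\{\|\mathcal{V}_0 \mathcal{D}_\epsilon^{-2} \nabla\| \le r\}$ it holds

$$
\sup_{\upsilon \in \Upsilon_0(r)} \mathbb{L}_\epsilon(\upsilon, \upsilon^*) = \sup_{\upsilon} \mathbb{L}_\epsilon(\upsilon, \upsilon^*) = \frac{1}{2} \|\mathcal{D}_\epsilon^{-1} \nabla\|^2.
$$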

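Proof. The identity (D.5) directly follows by maximizing the quadratic expression

$$
\mathbb{L}_\epsilon(\upsilon, \upsilon^*) = (\upsilon - \upsilon^*)^\top \nabla - \|\mathcal{D}_\epsilon(\upsilon - \upsilon^*)\|^2/2,
$$

with the maximum at $\upsilon = \upsilon^* + \mathcal{D}_\epsilon^{-2} \nabla$. Similarly, the maximum of $\mathbb{L}_{\underline\epsilon}(\upsilon, \upsilon^*)$ is achieved
at $\upsilon = \upsilon^* + \mathcal{D}_{\underline\epsilon}^{-2} \nabla$, which is within $\Upsilon_0(r)$ under the condition

$$
\|\mathcal{V}_0 \mathcal{D}_{\underline\epsilon}^{-2} \nabla\| \le \|\mathcal{V}_0 \mathcal{D}_\epsilon^{-2} \nabla\| \le r.
$$

This yields the claim. □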
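The maximization step in this proof can be checked numerically in a small example. The sketch below is only an illustration under assumed inputs: the $2 \times 2$ matrix `d2` stands in for $\mathcal{D}_\epsilon^2$ and `grad` for the score $\nabla$, both hypothetical, and the helper names are not taken from the paper. It verifies that the quadratic process is maximized at $u = \mathcal{D}_\epsilon^{-2}\nabla$ with value $\|\mathcal{D}_\epsilon^{-1}\nabla\|^2/2$.

```python
def solve_2x2(m, b):
    """Solve m u = b for a 2x2 matrix m (list of rows) by Cramer's rule."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [(b[0] * m[1][1] - m[0][1] * b[1]) / det,
            (m[0][0] * b[1] - m[1][0] * b[0]) / det]


def quad_process(d2, grad, u):
    """Value u^T grad - u^T D^2 u / 2 of the quadratic process at v = v* + u."""
    lin = u[0] * grad[0] + u[1] * grad[1]
    quad = sum(u[i] * d2[i][j] * u[j] for i in range(2) for j in range(2))
    return lin - quad / 2.0


d2 = [[2.0, 0.5], [0.5, 1.0]]  # hypothetical positive definite D_eps^2
grad = [1.0, -2.0]             # hypothetical score vector

u_star = solve_2x2(d2, grad)                # maximizer D^{-2} grad
sup_value = quad_process(d2, grad, u_star)  # value of the process at the maximizer
# grad^T D^{-2} grad / 2 equals ||D^{-1} grad||^2 / 2.
closed_form = 0.5 * (grad[0] * u_star[0] + grad[1] * u_star[1])
print(u_star, sup_value, closed_form)
```

Perturbing `u_star` in any direction lowers `quad_process`, which is the concavity used in the lemma.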

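The next lemma states similar results for the constrained maximum of $\mathbb{L}_\epsilon$ and $\mathbb{L}_{\underline\epsilon}$
subject to $\Pi_0 \upsilon = \theta^*$. The proof is the same as for Lemma D.1. Recall the notation
$\nabla_\theta \stackrel{\mathrm{def}}{=} \nabla_\theta L(\upsilon^*)$, $\nabla_\eta \stackrel{\mathrm{def}}{=} \nabla_\eta L(\upsilon^*)$. We also use the block representation of $\mathcal{D}_0^2$:

$$
\mathcal{D}_0^2 = \begin{pmatrix} D_0^2 & A_0 \\ A_0^\top & H_0^2 \end{pmatrix}.
$$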

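Lemma D.2. It holds

$$
\sup_{\upsilon:\ \Pi_0 \upsilon = \theta^*} \mathbb{L}_\epsilon(\upsilon, \upsilon^*) = \frac{1}{2} \|H_\epsilon^{-1} \nabla_\eta\|^2. \qquad \text{(D.7)}
$$

Moreover, it holds on the random set $\{\|Q_0 H_\epsilon^{-2} \nabla_\eta\| \le r\}$

$$
\sup_{\upsilon \in \Upsilon_0(r):\ \Pi_0 \upsilon = \theta^*} \mathbb{L}_\epsilon(\upsilon, \upsilon^*) = \sup_{\upsilon:\ \Pi_0 \upsilon = \theta^*} \mathbb{L}_\epsilon(\upsilon, \upsilon^*) = \frac{1}{2} \|H_\epsilon^{-1} \nabla_\eta\|^2.
$$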
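Further, define the process

$$
\mathbb{L}(\upsilon, \upsilon^*) = (\upsilon - \upsilon^*)^\top \nabla - \|\mathcal{D}_0(\upsilon - \upsilon^*)\|^2/2.
$$

Recall the definition of $\breve\nabla_\theta$ and $\breve{D}_0^2$:

$$
\breve\nabla_\theta = \nabla_\theta - A_0 H_0^{-2} \nabla_\eta, \qquad
\breve{D}_0^2 \stackrel{\mathrm{def}}{=} D_0^2 - A_0 H_0^{-2} A_0^\top.
$$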

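Lemma D.3. It holds on the random set $\{\|\mathcal{V}_0 \mathcal{D}_0^{-2} \nabla\| \le r,\ \|Q_0 H_0^{-2} \nabla_\eta\| \le r\}$

$$
\sup_{\upsilon \in \Upsilon_0(r)} \mathbb{L}(\upsilon, \upsilon^*) = \sup_{\upsilon} \mathbb{L}(\upsilon, \upsilon^*) = \frac{1}{2} \|\mathcal{D}_0^{-1} \nabla\|^2,
$$

$$
\sup_{\upsilon \in \Upsilon_0(r):\ \Pi_0 \upsilon = \theta^*} \mathbb{L}(\upsilon, \upsilon^*) = \sup_{\upsilon:\ \Pi_0 \upsilon = \theta^*} \mathbb{L}(\upsilon, \upsilon^*) = \frac{1}{2} \|H_0^{-1} \nabla_\eta\|^2,
$$

$$
\sup_{\upsilon} \mathbb{L}(\upsilon, \upsilon^*) - \sup_{\upsilon:\ \Pi_0 \upsilon = \theta^*} \mathbb{L}(\upsilon, \upsilon^*) = \frac{1}{2} \|\breve{D}_0^{-1} \breve\nabla_\theta\|^2. \qquad \text{(D.8)}
$$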

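Proof. First consider the adaptive case with $A_0 = 0$, yielding $\breve{D}_0 = D_0$ and $\breve\nabla_\theta = \nabla_\theta$.
Then the process $\mathbb{L}(\upsilon, \upsilon^*)$ can be decomposed as

$$
\mathbb{L}(\upsilon, \upsilon^*) = (\theta - \theta^*)^\top \nabla_\theta - \frac{1}{2} \|D_0(\theta - \theta^*)\|^2
+ (\eta - \eta^*)^\top \nabla_\eta - \frac{1}{2} \|H_0(\eta - \eta^*)\|^2,
$$

and the partial optimization subject to $\theta = \theta^*$ yields the results (D.6) and (D.7). Note
that the constrained maximum is attained at $\eta = \eta^* + H_0^{-2} \nabla_\eta$.

The general case can be reduced to the adaptive one by a change of variable. With
$\gamma \stackrel{\mathrm{def}}{=} \eta - \eta^* + H_0^{-2} A_0^\top (\theta - \theta^*)$, one can represent $\mathbb{L}(\upsilon, \upsilon^*)$ in the form

$$
\mathbb{L}(\upsilon, \upsilon^*) = (\theta - \theta^*)^\top \breve\nabla_\theta - \|\breve{D}_0(\theta - \theta^*)\|^2/2 + \gamma^\top \nabla_\eta - \|H_0 \gamma\|^2/2,
$$

which corresponds to the decomposition in the adaptive case. □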
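The change of variable can be verified by direct arithmetic when $\theta$ and $\eta$ are both scalar. The sketch below is a minimal check under assumed values: the block entries `d2`, `a`, `h2` stand in for $D_0^2$, $A_0$, $H_0^2$, and the scores are hypothetical numbers chosen for illustration. It compares the original quadratic process with its decomposed form and evaluates the gap $\|\breve{D}_0^{-1}\breve\nabla_\theta\|^2/2$ from (D.8).

```python
def efficient_parts(d2, a, h2, grad_theta, grad_eta):
    """Efficient information D0^2 - A0 H0^{-2} A0^T and score (scalar blocks)."""
    breve_d2 = d2 - a * a / h2
    breve_grad = grad_theta - a * grad_eta / h2
    return breve_d2, breve_grad


def quad_full(d2, a, h2, grad_theta, grad_eta, u, eta):
    """Quadratic process in the original coordinates u = theta - theta*, eta - eta*."""
    lin = u * grad_theta + eta * grad_eta
    quad = d2 * u * u + 2.0 * a * u * eta + h2 * eta * eta
    return lin - quad / 2.0


def quad_decomposed(d2, a, h2, grad_theta, grad_eta, u, eta):
    """Same process after the change of variable gamma = eta + H0^{-2} A0^T u."""
    breve_d2, breve_grad = efficient_parts(d2, a, h2, grad_theta, grad_eta)
    gamma = eta + a * u / h2
    return (u * breve_grad - breve_d2 * u * u / 2.0
            + gamma * grad_eta - h2 * gamma * gamma / 2.0)


params = (2.0, 0.5, 1.0, 1.0, -2.0)  # hypothetical d2, a, h2, grad_theta, grad_eta
breve_d2, breve_grad = efficient_parts(*params)
sup_gap = breve_grad ** 2 / (2.0 * breve_d2)  # right-hand side of (D.8)
print(quad_full(*params, 0.3, -0.7), quad_decomposed(*params, 0.3, -0.7), sup_gap)
```

The two evaluations agree at every point, so maximizing over $\gamma$ and over $\theta$ separates exactly as in the adaptive case.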

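On the random set $\{\tilde\upsilon \in \Upsilon_0(r),\ \tilde\upsilon_{\theta^*} \in \Upsilon_0(r)\}$, it holds

$$
\breve{L}(\tilde\theta, \theta^*) = \breve{L}(\tilde\theta) - \breve{L}(\theta^*) = \sup_{\upsilon \in \Upsilon_0(r)} L(\upsilon, \upsilon^*) - \sup_{\upsilon \in \Upsilon_0(r):\ \Pi_0 \upsilon = \theta^*} L(\upsilon, \upsilon^*).
$$

Theorem B.1 implies

$$
\sup_{\upsilon \in \Upsilon_0(r)} \mathbb{L}_{\underline\epsilon}(\upsilon, \upsilon^*) - \diamondsuit_{\underline\epsilon}(r)
\le \sup_{\upsilon \in \Upsilon_0(r)} L(\upsilon, \upsilon^*)
\le \sup_{\upsilon \in \Upsilon_0(r)} \mathbb{L}_\epsilon(\upsilon, \upsilon^*) + \diamondsuit_\epsilon(r). \qquad \text{(D.9)}
$$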
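The same bound applies to the maximum taken over the subset $\{\upsilon \in \Upsilon_0(r) : \Pi_0 \upsilon = \theta^*\}$.
By Lemmas D.1 and D.2, on the random set $C_\epsilon(r)$, one can replace the sup of $\mathbb{L}_\epsilon(\upsilon, \upsilon^*)$
over $\Upsilon_0(r)$ by the sup over the whole vector space $\mathbb{R}^{p^*}$. Putting all the obtained bounds
together yields

$$
\breve{L}(\tilde\theta, \theta^*) \ge \sup_{\upsilon} \mathbb{L}_{\underline\epsilon}(\upsilon, \upsilon^*) - \sup_{\upsilon:\ \Pi_0 \upsilon = \theta^*} \mathbb{L}_\epsilon(\upsilon, \upsilon^*) - \diamondsuit_\epsilon(r) - \diamondsuit_{\underline\epsilon}(r),
$$

$$
\breve{L}(\tilde\theta, \theta^*) \le \sup_{\upsilon} \mathbb{L}_\epsilon(\upsilon, \upsilon^*) - \sup_{\upsilon:\ \Pi_0 \upsilon = \theta^*} \mathbb{L}_{\underline\epsilon}(\upsilon, \upsilon^*) + \diamondsuit_\epsilon(r) + \diamondsuit_{\underline\epsilon}(r).
$$

Define

$$
\square_\epsilon \stackrel{\mathrm{def}}{=} \sup_{\upsilon} \mathbb{L}_\epsilon(\upsilon, \upsilon^*) - \sup_{\upsilon} \mathbb{L}(\upsilon, \upsilon^*), \qquad
\square_{\underline\epsilon} \stackrel{\mathrm{def}}{=} \sup_{\upsilon} \mathbb{L}(\upsilon, \upsilon^*) - \sup_{\upsilon} \mathbb{L}_{\underline\epsilon}(\upsilon, \upsilon^*).
$$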
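Lemma D.1 implies

$$
2\square_\epsilon = \|\mathcal{D}_\epsilon^{-1} \nabla\|^2 - \|\mathcal{D}_0^{-1} \nabla\|^2, \qquad
2\square_{\underline\epsilon} = \|\mathcal{D}_0^{-1} \nabla\|^2 - \|\mathcal{D}_{\underline\epsilon}^{-1} \nabla\|^2.
$$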



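Define now

$$
\alpha_\epsilon \stackrel{\mathrm{def}}{=} \|\mathcal{D}_0 \mathcal{D}_\epsilon^{-2} \mathcal{D}_0 - I\|_\infty, \qquad
\alpha_{\underline\epsilon} \stackrel{\mathrm{def}}{=} \|I - \mathcal{D}_0 \mathcal{D}_{\underline\epsilon}^{-2} \mathcal{D}_0\|_\infty.
$$

The regularity condition $(\mathcal{I})$, $\mathfrak{a}^2 \mathcal{D}_0^2 \ge \mathcal{V}_0^2$, implies for $\mathcal{D}_\epsilon^2 = \mathcal{D}_0^2(1 - \delta) - \varrho \mathcal{V}_0^2$
and $\mathcal{D}_{\underline\epsilon}^2 = \mathcal{D}_0^2(1 + \delta) + \varrho \mathcal{V}_0^2$

$$
\mathcal{D}_0^2 (1 - \tau_\epsilon) \le \mathcal{D}_\epsilon^2 \le \mathcal{D}_0^2, \qquad
\mathcal{D}_0^2 \le \mathcal{D}_{\underline\epsilon}^2 \le \mathcal{D}_0^2 (1 + \tau_\epsilon),
$$

with $\tau_\epsilon = \delta + \varrho \mathfrak{a}^2$, so that the quantities $\alpha_\epsilon$ and $\alpha_{\underline\epsilon}$ satisfy

$$
\alpha_\epsilon \le \frac{1}{1 - \tau_\epsilon} - 1 = \frac{\tau_\epsilon}{1 - \tau_\epsilon}, \qquad
\alpha_{\underline\epsilon} \le 1 - \frac{1}{1 + \tau_\epsilon} = \frac{\tau_\epsilon}{1 + \tau_\epsilon}.
$$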



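This yields

$$
2\square_\epsilon \le \alpha_\epsilon \|\mathcal{D}_0^{-1} \nabla\|^2, \qquad
2\square_{\underline\epsilon} \le \alpha_{\underline\epsilon} \|\mathcal{D}_0^{-1} \nabla\|^2. \qquad \text{(D.10)}
$$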
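Similarly, by using the result of Lemma D.2,

$$
\square_{\epsilon,1} \stackrel{\mathrm{def}}{=} \sup_{\upsilon:\ \Pi_0 \upsilon = \theta^*} \mathbb{L}_\epsilon(\upsilon, \upsilon^*) - \sup_{\upsilon:\ \Pi_0 \upsilon = \theta^*} \mathbb{L}(\upsilon, \upsilon^*)
= \frac{1}{2} \bigl( \|H_\epsilon^{-1} \nabla_\eta\|^2 - \|H_0^{-1} \nabla_\eta\|^2 \bigr) \le \frac{\alpha_\epsilon}{2} \|H_0^{-1} \nabla_\eta\|^2,
$$

$$
\square_{\underline\epsilon,1} \stackrel{\mathrm{def}}{=} \sup_{\upsilon:\ \Pi_0 \upsilon = \theta^*} \mathbb{L}(\upsilon, \upsilon^*) - \sup_{\upsilon:\ \Pi_0 \upsilon = \theta^*} \mathbb{L}_{\underline\epsilon}(\upsilon, \upsilon^*)
= \frac{1}{2} \bigl( \|H_0^{-1} \nabla_\eta\|^2 - \|H_{\underline\epsilon}^{-1} \nabla_\eta\|^2 \bigr) \le \frac{\alpha_{\underline\epsilon}}{2} \|H_0^{-1} \nabla_\eta\|^2.
$$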

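Further, (D.10) and (D.8) yield

$$
\breve{L}(\tilde\theta, \theta^*) \ge \frac{1}{2} \|\breve{D}_0^{-1} \breve\nabla_\theta\|^2 - \square_{\underline\epsilon} - \square_{\epsilon,1} - \diamondsuit_\epsilon(r) - \diamondsuit_{\underline\epsilon}(r),
$$

$$
\breve{L}(\tilde\theta, \theta^*) \le \frac{1}{2} \|\breve{D}_0^{-1} \breve\nabla_\theta\|^2 + \square_\epsilon + \square_{\underline\epsilon,1} + \diamondsuit_\epsilon(r) + \diamondsuit_{\underline\epsilon}(r).
$$

The proof of (D.2) and (D.3) is completed.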

The next step is to bound the spread $\Delta_\epsilon(r)$ from (D.4). The error terms $\diamondsuit_\epsilon(r)$ and
$\diamondsuit_{\underline\epsilon}(r)$ follow the bound (B.2) of Theorem B.1 and they are of order $\varrho (p^* + x)$. Further we
have to show that $\tau_\epsilon \|\mathcal{D}_0^{-1} \nabla\|^2$ is small relative to $\|\xi\|^2$ and similarly for $\tau_\epsilon \|H_0^{-1} \nabla_\eta\|^2$.
Theorem C.1 provides a general deviation probability bound for such quadratic forms.
In particular, for $\mathbb{B} \stackrel{\mathrm{def}}{=} \mathcal{D}_0^{-1} \mathcal{V}_0^2 \mathcal{D}_0^{-1}$ and $x \le x_c$

$$
\mathbb{P}\bigl( \|\mathcal{D}_0^{-1} \nabla\|^2 > \mathfrak{z}(x, \mathbb{B}) \bigr) \le 2e^{-x} + 8.4\, e^{-x_c},
$$

where $\mathfrak{z}(x, \mathbb{B}) \le \operatorname{tr}(\mathbb{B}) + 6x$ and the constant $x_c$ is large; see Section C for a precise
formulation. Under the regularity condition $(\mathcal{I})$ it holds $\operatorname{tr}(\mathbb{B}) \le \mathfrak{a}^2 p^*$. A similar
bound holds for $\|H_0^{-1} \nabla_\eta\|^2$. We conclude that the spread $\Delta_\epsilon(r)$ can be bounded with
a probability of order $1 - e^{-x}$ by $C \tau_\epsilon (p^* + x)$ for a fixed constant $C$.

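Further we have to show that the random set $C_\epsilon(r)$ from (D.1) is of dominating
probability if $r^2 = C(p^* + x)$ for a proper constant $C$. By definition

$$
C_\epsilon(r) = \bigl\{ \tilde\upsilon \in \Upsilon_0(r),\ \tilde\upsilon_{\theta^*} \in \Upsilon_0(r),\ \|\mathcal{V}_0 \mathcal{D}_\epsilon^{-2} \nabla\| \le r,\ \|Q_0 H_\epsilon^{-2} \nabla_\eta\| \le r \bigr\}.
$$

Theorem B.2 yields

$$
\mathbb{P}\{\tilde\upsilon \notin \Upsilon_0(r)\} + \mathbb{P}\{\tilde\upsilon_{\theta^*} \notin \Upsilon_0(r)\} \le 2e^{-x}.
$$

To control the probability $\mathbb{P}(\|\mathcal{V}_0 \mathcal{D}_\epsilon^{-2} \nabla\| > r)$ we apply Theorem C.1 with
$\mathbb{B} = \mathcal{D}_0^{-1} \mathcal{V}_0^2 \mathcal{D}_0^{-1}$.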
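With the definitions from Section C

$$
\begin{aligned}
\mathbb{P}(\|\mathcal{V}_0 \mathcal{D}_\epsilon^{-2} \nabla\| > r)
&\le \mathbb{P}\bigl( \|\mathcal{D}_0^{-1} \nabla\| \times \|\mathcal{V}_0 \mathcal{D}_\epsilon^{-2} \mathcal{D}_0\|_\infty > r \bigr) \\
&\le \mathbb{P}\bigl( \|\mathcal{D}_0^{-1} \nabla\| \times \|\mathcal{V}_0 \mathcal{D}_0^{-1}\|_\infty > (1 - \tau_\epsilon) r \bigr) \\
&\le \mathbb{P}\bigl( \|\mathcal{D}_0^{-1} \nabla\| > (1 - \tau_\epsilon) r / \mathfrak{a} \bigr) \\
&\le \mathbb{P}\bigl\{ \|\mathcal{D}_0^{-1} \nabla\|^2 > \mathfrak{z}(x, \mathbb{B}) \bigr\} \\
&\le 2e^{-x} + 8.4\, e^{-x_c},
\end{aligned}
$$

provided that $r^2 \ge \mathfrak{a}^4 (1 - \tau_\epsilon)^{-2} (p^* + 6x)$ and $x \le x_c$. By similar arguments with
$\mathbb{B}_\eta = H_0^{-1} Q_0^2 H_0^{-1}$ in place of $\mathbb{B}$

$$
\mathbb{P}(\|Q_0 H_\epsilon^{-2} \nabla_\eta\| > r) \le 2e^{-x} + 8.4\, e^{-x_c}.
$$

Putting the obtained bounds together shows that for $x \le x_c$ and $r^2 \ge C_1 (p^* + x)$, it
holds

$$
1 - \mathbb{P}(C_\epsilon(r)) \le C_2 e^{-x},
$$

for some fixed constants $C_1$ and $C_2$ depending on $\tau_\epsilon$ and $\mathfrak{a}$ only. This completes the
proof.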

D.2 Proof of Theorem 2.2 

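We show that

$$
\|\breve{D}_0(\tilde\theta - \theta^*) - \breve\xi\| \le \sqrt{\frac{2\Delta_\epsilon(r)}{1 - \tau_\epsilon}} + \frac{\tau_\epsilon}{1 - \tau_\epsilon} \|\mathcal{D}_0^{-1} \nabla\|. \qquad \text{(D.11)}
$$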

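First we derive the expansion for the whole parameter vector $\tilde\upsilon$. On the set $C_\epsilon(r)$, the
bracketing bound (D.9) and (D.5) imply

$$
L(\tilde\upsilon, \upsilon^*) \ge \frac{1}{2} \|\mathcal{D}_{\underline\epsilon}^{-1} \nabla\|^2 - \diamondsuit_{\underline\epsilon}(r).
$$

The bracketing bound (D.9) applied at $\tilde\upsilon$ implies

$$
L(\tilde\upsilon, \upsilon^*) \le \mathbb{L}_\epsilon(\tilde\upsilon, \upsilon^*) + \diamondsuit_\epsilon(r).
$$

These two bounds together yield by the definition of $\mathbb{L}_\epsilon(\tilde\upsilon, \upsilon^*)$

$$
(\tilde\upsilon - \upsilon^*)^\top \nabla - \frac{1}{2} \|\mathcal{D}_\epsilon(\tilde\upsilon - \upsilon^*)\|^2 \ge \frac{1}{2} \|\mathcal{D}_\epsilon^{-1} \nabla\|^2 - \diamondsuit_\epsilon(r) - \diamondsuit_{\underline\epsilon}(r) - \square_\epsilon - \square_{\underline\epsilon},
$$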
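and thus

$$
\|\mathcal{D}_\epsilon(\tilde\upsilon - \upsilon^*) - \mathcal{D}_\epsilon^{-1} \nabla\|^2 \le 2\{\square_\epsilon + \square_{\underline\epsilon} + \diamondsuit_\epsilon(r) + \diamondsuit_{\underline\epsilon}(r)\} \le 2\Delta_\epsilon. \qquad \text{(D.12)}
$$

The condition $(\mathcal{I})$ implies the inequality $\mathcal{D}_0^{-1} \mathcal{D}_\epsilon^2 \mathcal{D}_0^{-1} \ge (1 - \tau_\epsilon) I$ and hence,

$$
\|\mathcal{D}_0 \mathcal{D}_\epsilon^{-2} \mathcal{D}_0\|_\infty \le (1 - \tau_\epsilon)^{-1}.
$$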

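This and (D.12) provide

$$
\|\mathcal{D}_0(\tilde\upsilon - \upsilon^*) - \mathcal{D}_0 \mathcal{D}_\epsilon^{-2} \nabla\|^2 \le \frac{2\Delta_\epsilon}{1 - \tau_\epsilon}.
$$

Similarly

$$
\|\mathcal{D}_0 \mathcal{D}_\epsilon^{-2} \nabla - \mathcal{D}_0^{-1} \nabla\| = \|(\mathcal{D}_0 \mathcal{D}_\epsilon^{-2} \mathcal{D}_0 - I)\, \mathcal{D}_0^{-1} \nabla\| \le \frac{\tau_\epsilon}{1 - \tau_\epsilon} \|\mathcal{D}_0^{-1} \nabla\|.
$$

Putting together the last two bounds yields

$$
\|\mathcal{D}_0(\tilde\upsilon - \upsilon^*) - \mathcal{D}_0^{-1} \nabla\| \le \sqrt{\frac{2\Delta_\epsilon}{1 - \tau_\epsilon}} + \frac{\tau_\epsilon}{1 - \tau_\epsilon} \|\mathcal{D}_0^{-1} \nabla\|.
$$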

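It remains to note that for any $u \in \mathbb{R}^p$, $\eta \in \mathbb{R}^m$, and $w = (u, \eta) \in \mathbb{R}^{p^*}$, it holds with
$\gamma = \eta + H_0^{-2} A_0^\top u \in \mathbb{R}^m$

$$
\|\mathcal{D}_0 w\|^2 = \|\breve{D}_0 u\|^2 + \|H_0 \gamma\|^2 \ge \|\breve{D}_0 u\|^2. \qquad \text{(D.13)}
$$

Also we use $\Pi_0 \mathcal{D}_0^{-2} \nabla = \breve{D}_0^{-2} \breve\nabla_\theta$. This implies for $w = \tilde\upsilon - \upsilon^*$ by (D.13)

$$
\|\breve{D}_0(\tilde\theta - \theta^*) - \breve{D}_0^{-1} \breve\nabla_\theta\| = \|\breve{D}_0(\tilde\theta - \theta^* - \breve{D}_0^{-2} \breve\nabla_\theta)\| \le \|\mathcal{D}_0(w - \mathcal{D}_0^{-2} \nabla)\|,
$$

and the assertion (D.11) follows.
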
D.3 Proof of Theorem 2.3 

Choose $x_n \to \infty$ and $x_n = o(n^{1/3})$, e.g. $x_n = C \log(n)$. Then $\beta_n \to 0$ and $\mathbb{P}(\Omega(x_n)) \to 1$.
Moreover, in the i.i.d. setting $x_c \asymp g \asymp \sqrt{n}$ and thus $x_n \le x_c$. Similarly for $n$ large
enough with $r_n^2 = r_0^2(x_n) = C(p^* + x_n)$

$$
\tau_\epsilon \asymp r_n/\sqrt{n} = \sqrt{(p^* + x_n)/n} \le n^{-1/3} \le 1/2.
$$

Also the i.i.d. structure of the data yields

$$
\breve{D}_0^2 = n \breve{F}.
$$
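Now Theorems 2.1 and 2.2 can be applied yielding the first statement of the theorem. It
remains to check asymptotic standard normality of the sum

$$
\breve\xi = (n \breve{F})^{-1/2} \sum_{i=1}^{n} \breve\nabla_i
$$

with

$$
\breve\nabla_i \stackrel{\mathrm{def}}{=} \nabla_\theta \ell(Y_i, \upsilon^*) - F_{\theta\eta} F_{\eta\eta}^{-1} \nabla_\eta \ell(Y_i, \upsilon^*).
$$

The result follows from the central limit theorem because $\operatorname{Cov}(\breve\nabla_i) = \breve{F}$ for all $i$.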

D.4 Proof of Theorem 2.4 

The first step of the proof shows that for $n$ large enough, the MLE $\tilde\upsilon \in \mathbb{R}^{p_n}$ belongs with
probability close to one to the $\delta = 1/n$ vicinity $S_\delta$ of the set $S$ from (2.7). The second
step is to show that with a probability exceeding a fixed constant $\alpha > 0$, the profile
MLE $\tilde\theta$ differs significantly from $X_1$, which is the profile MLE in the linear Gaussian
model. The third step focuses on the case $\beta_n \to \infty$.

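1. First we show that for $n$ large enough, the MLE $\tilde\upsilon \in \mathbb{R}^{p_n}$ lies in $S_\delta$ with
probability close to one. For this we check that the maximum of $L(\upsilon)$ on $S_\delta^c$ is smaller
than a similar maximum on $S$ for "typical" values of $X$ and $n$ large enough. Indeed,
for any point $\upsilon \in S_\delta^c$

$$
L(\upsilon, 0) \le \max_{\upsilon \in S_\delta^c} L(\upsilon, 0) = \max_{\upsilon \in S_\delta^c} \bigl\{ n X^\top \upsilon - n\|\upsilon\|^2/2 \bigr\}
\le \max_{\upsilon} \bigl\{ n X^\top \upsilon - n\|\upsilon\|^2/2 \bigr\} = \frac{n}{2} \|X\|^2.
$$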
Further, introduce a random set of "typical" values X : 

It is straightforward to see that $\mathbb{P}(X \in C_1)$ is exponentially close to one for $n$ large.
Below we assume that $X \in C_1$ and study the value $L(\upsilon, 0)$ for $\upsilon \in S$. Let also $n$ be
large enough to ensure that


>o - =o\/A>- (D.14) 



2V6 \nJ ~ 2\n J 2 

Introduce $X_S$ as the closest point in $S$ to $X$ whose first coordinate $\upsilon_1$ satisfies $|\upsilon_1| > |X_1|$. This point always exists
by the definition of $S$. Denote

$$
\delta(X) = \|X - X_S\| = |X_1 - \upsilon_1|.
$$






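By construction of $S$, it holds $\delta(X) \le 0.5\sqrt{\beta_n/n}$ for $X \in C_1$. For $n$ satisfying (D.14)
this also yields $[\|X\| - \delta(X)]^3 \ge \frac{1}{2}\|X\|^3$. Now we have for $X \in C_1$

$$
\begin{aligned}
\max_{\upsilon \in S} L(\upsilon, 0) &\ge L(X_S, 0) \\
&\ge n\|X\|^2 - n|X_1|\delta(X) - \frac{n}{2}\bigl\{ \|X\|^2 - 2|X_1|\delta(X) + \delta^2(X) \bigr\}
+ n\bigl\{ \|X\|^2 - 2|X_1|\delta(X) + \delta^2(X) \bigr\}^{3/2} \\
&\ge \frac{n}{2}\|X\|^2 - n\delta^2(X) + n\{\|X\| - \delta(X)\}^3 \\
&\ge \frac{n}{2}\|X\|^2 - \frac{\beta_n}{4} + \frac{n}{2}\|X\|^3 > \frac{n}{2}\|X\|^2 \ge \max_{\upsilon \in S_\delta^c} L(\upsilon, 0).
\end{aligned}
$$

This implies $\tilde\upsilon \in S_\delta$.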

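2. Now we discuss the case when $\beta_n^2 = p_n^3/n \to (6c)^4$ for some $c > 0$ and show that
the profile MLE $\tilde\theta$ deviates significantly from $X_1$ on a random set of positive probability.
Define for each $n \in \mathbb{N}$

$$
C_n = C_1 \cap \bigl\{ \|X - X_S\| > \tfrac{1}{6}\sqrt{\beta_n/n} \bigr\} = C_1 \cap \bigl\{ |X_1 - X_{S,1}| > \tfrac{1}{6}\sqrt{\beta_n/n} \bigr\}.
$$

It is easy to see that $\mathbb{P}(C_n) \ge \alpha$ for some fixed $\alpha > 0$ and all $n$. It remains to note
that on the set $C_n$ it holds under (D.14)

$$
\|\breve{D}_0(\tilde\theta - \theta^*) - \breve\xi\| = \sqrt{n}\,|\tilde\theta - X_1|
\ge \sqrt{n}\,|X_1 - X_{S,1}| - \sqrt{n}/n
\ge \frac{1}{6}\beta_n^{1/2} - \frac{1}{\sqrt{n}} \to
\begin{cases}
\infty, & p_n^3/n \to \infty, \\
c, & p_n^3/n \to (6c)^4;
\end{cases}
$$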

which yields the claim. 

3. Finally consider the case when $\beta_n \to \infty$. Fix any sequence $c_n$ such that $c_n \to 0$
and $c_n \beta_n \to \infty$, e.g. $c_n = \beta_n^{-1/2}$. Consider the random set

c n = a n {pr - x§|| > ^^/K/^} = Ci n - x Sil | > ^ . 

Then $\mathbb{P}(C_n) \to 1$ and on $C_n$

\\ih>(B-P)-i\\ > >y 2 -^^oo, 



as required. 